In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize for communication cost, i.e., the amount of data transferred from the mappers to the reducers. We identify join attribute values that appear very frequently, called Heavy Hitters (HH). We distribute HH-valued records to reducers in a way that avoids skew, using an adaptation of the Shares~\cite{AfUl} algorithm to achieve minimum communication cost. Our algorithm has been implemented for experimentation and is offered as open-source software. Furthermore, we investigate a class of multiway joins for which a simpler variant of the algorithm can handle skew. We give closed forms for computing the parameters of the algorithm for chain and symmetric joins.
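The two ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fixed-count heavy-hitter cutoff, and the use of Python's built-in `hash` as the bucketing function are illustrative assumptions. Under Shares, each join attribute receives a share, and the number of reducers is the product of the shares. A tuple is hashed on the attributes it contains and replicated to every bucket of the attributes it lacks. As a result, the communication cost is the sum, over relations, of each relation's size times the product of the shares of its missing attributes. The sketch also shows why heavy hitters matter: every tuple carrying the same HH value hashes to the same bucket. It only detects such values and does not implement the paper's skew-handling scheme.

```python
from collections import Counter
from itertools import product


def heavy_hitters(values, threshold):
    """Return values whose frequency exceeds `threshold`.
    ASSUMPTION: a fixed count cutoff stands in for the paper's HH criterion."""
    counts = Counter(values)
    return {v for v, c in counts.items() if c > threshold}


def reducers_for_tuple(tup, schema, shares, attrs):
    """Shares mapping: the coordinate of each attribute present in the tuple
    is fixed by hashing its value; each absent attribute is replicated over
    all of its buckets. Reducers are identified by coordinate tuples."""
    ranges = []
    for a in attrs:
        if a in schema:
            # Hypothetical hash function: Python's hash (deterministic for ints).
            ranges.append([hash(tup[schema.index(a)]) % shares[a]])
        else:
            ranges.append(range(shares[a]))
    return list(product(*ranges))


def communication_cost(sizes, schemas, shares, attrs):
    """Total tuples shipped: each relation's size times the product of the
    shares of the attributes that relation does not contain."""
    total = 0
    for rel, size in sizes.items():
        replication = 1
        for a in attrs:
            if a not in schemas[rel]:
                replication *= shares[a]
        total += size * replication
    return total


if __name__ == "__main__":
    # Chain join R(A,B) |><| S(B,C) |><| T(C,D) on k = 1*2*3*1 = 6 reducers.
    attrs = ("A", "B", "C", "D")
    shares = {"A": 1, "B": 2, "C": 3, "D": 1}
    schemas = {"R": ("A", "B"), "S": ("B", "C"), "T": ("C", "D")}
    r = reducers_for_tuple((5, 7), schemas["R"], shares, attrs)
    s = reducers_for_tuple((7, 4), schemas["S"], shares, attrs)
    t = reducers_for_tuple((4, 9), schemas["T"], shares, attrs)
    # The joining tuples (5,7), (7,4), (4,9) meet at exactly one reducer.
    print(set(r) & set(s) & set(t))
    print(communication_cost({"R": 1000, "S": 1000, "T": 1000}, schemas, shares, attrs))
    print(heavy_hitters([1, 1, 1, 2, 3, 1], threshold=2))
```

In the chain example, the dominated end attributes A and D get share 1. The cost therefore reduces to |R|·s_C + |S| + |T|·s_B, subject to s_B·s_C = k. This is the kind of expression for which the paper derives closed-form shares.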